The Asymptotic Behavior of Undiscounted Value Iteration in Markov Decision Problems
Authors
Abstract
This paper considers undiscounted Markov Decision Problems. For the general multichain case, we obtain necessary and sufficient conditions which guarantee that the maximal total expected reward for a planning horizon of n epochs, minus n times the long-run average expected reward, has a finite limit as n → ∞ for each initial state and each final reward vector. In addition, we obtain a characterization of the chain and periodicity structure of the set of one-step and J-step maximal-gain policies. Finally, we discuss the asymptotic properties of the undiscounted value-iteration method.
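To make the statement concrete, here is a minimal numerical sketch (not from the paper, using a hypothetical two-state MDP) of the quantity the abstract describes: run undiscounted value iteration and track V_n − n·g, where g is a crude estimate of the long-run average reward. When the limit exists, the printed differences stabilize.

```python
import numpy as np

# Hypothetical data (not from the paper): rewards r[a][s] and row-stochastic
# transition matrices P[a][s, s'] for a two-action, two-state MDP.
r = [np.array([1.0, 0.0]), np.array([0.5, 0.8])]
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),
     np.array([[0.5, 0.5], [0.6, 0.4]])]

def value_iteration(v0, n_steps):
    """Return the iterates V_0, V_1, ..., V_n of undiscounted value iteration."""
    v = v0.astype(float)
    history = [v.copy()]
    for _ in range(n_steps):
        v = np.max([r[a] + P[a] @ v for a in range(len(r))], axis=0)
        history.append(v.copy())
    return history

hist = value_iteration(np.zeros(2), 200)
# Crude gain estimate from the last one-step difference (fine for this
# aperiodic unichain example).
g = np.mean(hist[-1] - hist[-2])
for n in (50, 100, 200):
    print(n, hist[n] - n * g)   # stabilizes when the finite limit exists
```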
Similar Resources
A K-step look-ahead analysis of Value Iteration algorithms for Markov decision processes
We introduce and analyze a general look-ahead approach for Value Iteration Algorithms used in solving both discounted and undiscounted Markov decision processes. This approach, based on the value-oriented concept interwoven with multiple adaptive relaxation factors, leads to accelerating procedures which perform better than the separate use of either the concept of value orientation or of relaxa...
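The truncated abstract names two ingredients: a value-oriented look-ahead step and relaxation factors. The sketch below only illustrates that general idea, not the authors' algorithm; the MDP data, the discount factor beta, the relaxation factor omega, and k_look_ahead are all hypothetical.

```python
import numpy as np

beta = 0.9                      # hypothetical discount factor
r = [np.array([1.0, 0.0]), np.array([0.5, 0.8])]
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),
     np.array([[0.5, 0.5], [0.6, 0.4]])]

def accelerated_vi(v, k_look_ahead=5, omega=1.2, iters=100):
    for _ in range(iters):
        q = np.array([r[a] + beta * P[a] @ v for a in range(len(r))])
        greedy = np.argmax(q, axis=0)          # current greedy policy
        w = q[greedy, np.arange(len(v))]       # one full Bellman update
        # value-oriented step: iterate the greedy policy's operator k more times
        for _ in range(k_look_ahead):
            w = np.array([r[a][s] + beta * P[a][s] @ w
                          for s, a in enumerate(greedy)])
        v = v + omega * (w - v)                # relaxation step
    return v

print(accelerated_vi(np.zeros(2)))
```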
Generic rank-one corrections for value iteration in Markovian decision problems
Given a linear iteration of the form x := F(x), we consider modified versions of the form x := F(x + γd), where d is a fixed direction, and γ is chosen to minimize the norm of the residual ‖x + γd − F(x + γd)‖. We propose ways to choose d so that the convergence rate of the modified iteration is governed by the subdominant eigenvalue of the original. In the special case where F relates to a ...
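For the affine case F(x) = Ax + b, the correction has a closed form because the residual is affine in γ. The sketch below assumes that case with a placeholder direction d (the all-ones vector); the paper's contribution is the choice of d that ties convergence to the subdominant eigenvalue, which this toy example does not attempt to reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = 0.5 * rng.random((n, n)) / n + 0.4 * np.eye(n)   # contractive example map
b = rng.random(n)
d = np.ones(n)                                       # placeholder direction

def corrected_iteration(x, iters=50):
    for _ in range(iters):
        r0 = x - (A @ x + b)            # residual of the unmodified iterate
        w = d - A @ d                   # how the residual changes with gamma
        gamma = -(r0 @ w) / (w @ w)     # minimizes ||r0 + gamma * w||
        x = A @ (x + gamma * d) + b     # x := F(x + gamma * d)
    return x

x_star = np.linalg.solve(np.eye(n) - A, b)   # fixed point of F
print(np.linalg.norm(corrected_iteration(np.zeros(n)) - x_star))
```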
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on partitioning the state space into strongly connected components (SCCs), which can be classified into levels. In each level, smaller problems, called restricted MDPs, are solved, and these partial solutions are then combined to obtain the global solution. In this paper, we first propose a novel algorith...
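A rough sketch of the general SCC/level scheme the abstract refers to (not the paper's specific algorithm): build the transition graph of a hypothetical discounted MDP, condense it into strongly connected components, and solve the components in reverse topological order, reusing the values of already-solved downstream components as fixed boundary terms.

```python
import numpy as np
import networkx as nx

beta = 0.9
n_states, n_actions = 4, 2
rng = np.random.default_rng(1)
r = rng.random((n_actions, n_states))
# Hypothetical block-structured transitions: states 2,3 cannot reach 0,1.
P = np.zeros((n_actions, n_states, n_states))
P[:, :2, :] = rng.dirichlet(np.ones(n_states), size=(n_actions, 2))
P[:, 2:, 2:] = rng.dirichlet(np.ones(2), size=(n_actions, 2))

# One-step transition graph over all actions.
G = nx.DiGraph()
G.add_edges_from((s, t) for a in range(n_actions)
                 for s in range(n_states) for t in range(n_states)
                 if P[a, s, t] > 0)
cond = nx.condensation(G)                       # DAG of SCCs

v = np.zeros(n_states)
for c in reversed(list(nx.topological_sort(cond))):
    states = sorted(cond.nodes[c]["members"])   # states of this restricted MDP
    for _ in range(500):                        # value iteration on this block
        v[states] = np.max(
            [r[a, states] + beta * P[a][states] @ v for a in range(n_actions)],
            axis=0)
print(v)
```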
Reachability analysis of uncertain systems using bounded-parameter Markov decision processes
Verification of reachability properties for probabilistic systems is usually based on variants of Markov processes. Current methods assume an exact model of the dynamic behavior and are not suitable for realistic systems that operate in the presence of uncertainty and variability. This research note extends existing methods for Bounded-parameter Markov Decision Processes (BMDPs) to solve the re...
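For reachability in a bounded-parameter MDP, one standard building block is interval value iteration, in which the interval transition probabilities are resolved adversarially at every step. The sketch below illustrates that building block on a hypothetical 3-state, one-action BMDP; it is not the method of this research note.

```python
import numpy as np

goal = {2}
# Hypothetical bounds: P_low[a][s, t] <= P(t | s, a) <= P_high[a][s, t].
P_low = [np.array([[0.1, 0.3, 0.1],
                   [0.0, 0.5, 0.2],
                   [0.0, 0.0, 1.0]])]
P_high = [np.array([[0.5, 0.7, 0.4],
                    [0.3, 0.8, 0.5],
                    [0.0, 0.0, 1.0]])]

def worst_case_expectation(lo, hi, v):
    """Pick a distribution within [lo, hi] that minimizes the expectation of v."""
    p = lo.copy()
    slack = 1.0 - p.sum()                   # remaining mass to distribute
    for t in np.argsort(v):                 # give mass to low-value states first
        add = min(hi[t] - lo[t], slack)
        p[t] += add
        slack -= add
    return p @ v

def reach_lower_bound(iters=200):
    v = np.array([1.0 if s in goal else 0.0 for s in range(3)])
    for _ in range(iters):
        new_v = v.copy()
        for s in range(3):
            if s in goal:
                continue
            new_v[s] = max(worst_case_expectation(P_low[a][s], P_high[a][s], v)
                           for a in range(len(P_low)))
        v = new_v
    return v

print(reach_lower_bound())   # lower bounds on the reachability probabilities
```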
Regular Policies in Abstract Dynamic Programming
We consider challenging dynamic programming models where the associated Bellman equation and the value and policy iteration algorithms commonly exhibit complex and even pathological behavior. Our analysis is based on the new notion of regular policies. These are policies that are well-behaved with respect to value and policy iteration, and are patterned after proper policies, which are central...
Journal: Math. Oper. Res.
Volume: 2, Issue: -
Pages: -
Published: 1977